Ishan Misra

Research Scientist @ GenAI (Meta)

I work on computer vision and machine learning research, specifically in generative AI and self-supervised learning. I am a Research Scientist in the GenAI group at Meta, where I lead the research efforts on video generation models.

Previously, I was part of the FAIR team at Meta, where I worked on self-supervised learning in computer vision and multimodal learning.

For my work in self-supervised learning, I was featured in the MIT Technology Review’s 35 Innovators Under 35 list (compiled globally across technological disciplines). You can hear me on Lex Fridman’s podcast for an overview of my work.

I got my PhD at Carnegie Mellon University with Martial Hebert and Abhinav Gupta. My PhD Thesis received the SCS Distinguished Dissertation Award (Runner Up). I received CMU's Recent Alumni Achievement Award in 2024 for my research contributions to computer vision and machine learning.

News

2024 September

Awarded Carnegie Mellon University's Recent Alumni Achievement Award

2024 July

Mark Zuckerberg announced the release of Llama3 (with our efforts on video recognition).

2024 July

Talk at ELLIS Workshop on Open Problems in Computer Vision & Generative Modelling in Munich, Germany

2024 July

Talk at Oxford Machine Learning Summer School at the University of Oxford, UK

2024 March

Talk at ELLIS Winter School on Foundation Models in Amsterdam, Netherlands

2024 June

4 papers accepted at CVPR

2024 June

Emu Video now powers "animate" on meta.ai, which converts images to videos!

2024 June

Llama3 is released!

2023 November

Mark Zuckerberg announced our recent project Emu Video

2023 May

Mark Zuckerberg announced our recent foundational multimodal model ImageBind

2023 April

Mark Zuckerberg announced our recent foundational self-supervised model DINO-v2

2022 April

Keynote talk at the Ghost Day ML Conference

2021 March

Blog on "Self-supervised learning: The dark matter of intelligence" with Yann LeCun

Publications

I mainly publish on video and image understanding, video and image generation, object detection/segmentation, multimodal learning, and self-supervised learning.

The Llama 3 Herd of Models

The Llama3 Team (I was a Core Contributor for video recognition)

arXiv 2024

PDF Code

Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning

Rohit Girdhar^ *, Mannat Singh^ *, Andrew Brown *, Quentin Duval *, Samaneh Azadi *, Sai Saketh Rambhatla , Akbar Shah , Xi Yin , Devi Parikh , Ishan Misra *

ECCV 2024

PDF BibTeX Powers Meta's /animate product *Authors contributed equally

FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis

Feng Liang , Bichen Wu , Jialiang Wang , Licheng Yu , Kunpeng Li , Yinan Zhao , Ishan Misra , Jia-Bin Huang , Peizhao Zhang , Peter Vajda , Diana Marculescu

CVPR 2024

PDF BibTeX Highlight

InstanceDiffusion: Instance-level Control for Image Generation

Xudong Wang , Trevor Darrell , Sai Saketh Rambhatla , Rohit Girdhar , Ishan Misra

CVPR 2024

PDF Code BibTeX

Generating Illustrated Instructions

Sachit Menon , Ishan Misra , Rohit Girdhar

CVPR 2024

PDF BibTeX

VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation

Xudong Wang , Ishan Misra , Ziyun Zheng , Rohit Girdhar , Trevor Darrell

CVPR 2024

PDF BibTeX

The effectiveness of MAE pre-pretraining for billion-scale pretraining

Mannat Singh *, Quentin Duval *, Kalyan Vasudev Alwala *, Haoqi Fan , Vaibhav Aggarwal , Aaron Adcock , Armand Joulin , Piotr Dollár , Christoph Feichtenhofer , Ross Girshick , Rohit Girdhar , Ishan Misra

ICCV 2023

PDF Code Colab BibTeX *Authors contributed equally

MOST: Multiple Object localization with Self-supervised Transformers for object discovery

Sai Saketh Rambhatla , Ishan Misra , Rama Chellappa , Abhinav Shrivastava

ICCV 2023

PDF Code BibTeX Oral

MonoNeRF: Learning Generalizable NeRFs from Monocular Videos without Camera Poses

Yang Fu , Ishan Misra , Xiaolong Wang

ICML 2023

PDF BibTeX

ImageBind: One Embedding Space To Bind Them All

Rohit Girdhar *, Alaaeldin El-Nouby *, Zhuang Liu , Mannat Singh , Kalyan Vasudev Alwala , Armand Joulin , Ishan Misra *

CVPR 2023

Demo PDF Code BibTeX Highlighted paper *Authors contributed equally

Cut and Learn for Unsupervised Object Detection and Instance Segmentation

Xudong Wang , Rohit Girdhar , Stella X. Yu , Ishan Misra

CVPR 2023

PDF Code BibTeX

Learning Video Representations from Large Language Models

Yue Zhao , Ishan Misra , Philipp Krahenbuhl , Rohit Girdhar

CVPR 2023

PDF Code Colab BibTeX Highlighted paper

The Hidden Uniform Cluster Prior in Self-Supervised Learning

Mahmoud Assran , Randall Balestriero , Quentin Duval , Florian Bordes , Ishan Misra , Piotr Bojanowski , Pascal Vincent , Michael Rabbat , Nicolas Ballas

ICLR 2023

PDF BibTeX

OmniMAE: Single Model Masked Pretraining on Images and Videos

Rohit Girdhar *, Alaaeldin El-Nouby *, Mannat Singh *, Kalyan Vasudev Alwala *, Armand Joulin , Ishan Misra *

CVPR 2023

PDF Code BibTeX *Authors contributed equally

Masked Siamese Networks for Label-Efficient Learning

Mahmoud Assran , Mathilde Caron , Ishan Misra , Piotr Bojanowski , Florian Bordes , Pascal Vincent , Armand Joulin , Michael Rabbat , Nicolas Ballas

ECCV 2022

PDF Code BibTeX

Detecting Twenty-thousand Classes using Image-level Supervision

Xingyi Zhou , Rohit Girdhar , Armand Joulin , Philipp Krahenbuhl , Ishan Misra

ECCV 2022

PDF Code BibTeX

Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision

Priya Goyal , Quentin Duval , Isaac Seessel , Mathilde Caron , Ishan Misra , Levent Sagun , Armand Joulin , Piotr Bojanowski

arXiv 2022

PDF

Omnivore: A Single Model for Many Visual Modalities

Rohit Girdhar *, Mannat Singh *, Nikhila Ravi *, Laurens van der Maaten , Armand Joulin , Ishan Misra *

CVPR 2022

PDF Code BibTeX Oral *Authors contributed equally

Masked-attention Mask Transformer for Universal Image Segmentation

Bowen Cheng , Ishan Misra , Alexander G. Schwing , Alexander Kirillov , Rohit Girdhar

CVPR 2022

PDF Code BibTeX

An End-to-End Transformer Model for 3D Object Detection

Ishan Misra , Rohit Girdhar , Armand Joulin

ICCV 2021

PDF Code BibTeX Oral

Emerging Properties in Self-Supervised Vision Transformers

Mathilde Caron , Hugo Touvron , Ishan Misra , Hervé Jégou , Julien Mairal , Piotr Bojanowski , Armand Joulin

ICCV 2021

PDF Code

Self-Supervised Pretraining of 3D Features on any Point-Cloud

Zaiwei Zhang , Rohit Girdhar , Armand Joulin , Ishan Misra

ICCV 2021

PDF Code BibTeX

MDETR: Modulated Detection for End-to-End Multi-Modal Understanding

Aishwarya Kamath , Mannat Singh , Yann LeCun , Ishan Misra , Gabriel Synnaeve , Nicolas Carion

ICCV 2021

PDF Code Oral

Audio-Visual Instance Discrimination with Cross-Modal Agreement

Pedro Morgado , Nuno Vasconcelos , Ishan Misra

CVPR 2021

PDF Code BibTeX Best Paper Candidate

Robust Audio-Visual Instance Discrimination

Pedro Morgado , Ishan Misra , Nuno Vasconcelos

CVPR 2021

PDF BibTeX Oral

Barlow Twins: Self-Supervised Learning via Redundancy Reduction

Jure Zbontar *, Li Jing *, Ishan Misra , Yann LeCun , Stéphane Deny

ICML 2021

PDF Code BibTeX *Authors contributed equally

3D Spatial Recognition without Spatially Labeled 3D

Zhongzheng Ren , Ishan Misra , Alexander G. Schwing , Rohit Girdhar

CVPR 2021

PDF

Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

Mathilde Caron , Ishan Misra , Julien Mairal , Priya Goyal , Piotr Bojanowski , Armand Joulin

NeurIPS 2020

PDF Code BibTeX

Self-Supervised Learning of Pretext-Invariant Representations

Ishan Misra , Laurens van der Maaten

CVPR 2020

PDF Code BibTeX

ClusterFit: Improving Generalization of Visual Representations

Xueting Yan *, Ishan Misra *, Abhinav Gupta , Deepti Ghadiyaram *, Dhruv Mahajan *

CVPR 2020

PDF Code BibTeX *Authors contributed equally

In Defense of Grid Features for Visual Question Answering

Huaizu Jiang , Ishan Misra , Marcus Rohrbach , Erik Learned-Miller , Xinlei Chen

CVPR 2020

PDF Code BibTeX

3D-RelNet: Joint Object and Relational Network for 3D Prediction

Nilesh Kulkarni , Ishan Misra , Shubham Tulsiani , Abhinav Gupta

ICCV 2019

PDF Code BibTeX

Scaling and Benchmarking Self-Supervised Visual Representation Learning

Priya Goyal , Dhruv Mahajan , Abhinav Gupta *, Ishan Misra *

ICCV 2019

PDF Code BibTeX *Authors contributed equally

Binary Image Selection (BISON): Interpretable Evaluation of Visual Grounding

Hexiang Hu , Ishan Misra , Laurens van der Maaten

ICCV Workshop on Vision and Language 2019

PDF Code BibTeX

Does Object Recognition Work for Everyone?

Terrance DeVries *, Ishan Misra *, Changhan Wang *, Laurens van der Maaten

CVPR 2019

PDF BibTeX *Authors contributed equally

Mainstream: Dynamic Stem-Sharing for Multi-Tenant Video Processing

Angela Jiang , Daniel L.-K. Wong , Christopher Canel , Ishan Misra , Michael Kaminsky , Michael Kozuch , Padmanabhan Pillai , David G. Andersen , Gregory Ganger

USENIX Annual Technical Conference 2018

PDF BibTeX

Learning by Asking Questions

Ishan Misra , Ross Girshick , Rob Fergus , Martial Hebert , Abhinav Gupta , Laurens van der Maaten

CVPR 2018

PDF BibTeX Oral

Cut Paste and Learn: Surprisingly Easy Synthesis for Instance Detection

Debidatta Dwibedi , Ishan Misra , Martial Hebert

ICCV 2017

PDF Code BibTeX

From Red Wine to Red Tomato: Composition with Context

Ishan Misra , Abhinav Gupta , Martial Hebert

CVPR 2017

PDF Code BibTeX Oral

Shuffle and Learn: Unsupervised Learning using Temporal Order Verification

Ishan Misra , C. Lawrence Zitnick , Martial Hebert

ECCV 2016

PDF Code BibTeX

Seeing through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels

Ishan Misra , C. Lawrence Zitnick , Margaret Mitchell , Ross Girshick

CVPR 2016

PDF Code BibTeX

Cross-stitch Networks for Multi-Task Learning

Ishan Misra *, Abhinav Shrivastava *, Abhinav Gupta , Martial Hebert

CVPR 2016

PDF BibTeX Spotlight *Authors contributed equally

Generating Natural Questions About an Image

Nasrin Mostafazadeh , Ishan Misra , Jacob Devlin , Margaret Mitchell , Xiaodong He , Lucy Vanderwende

ACL 2016

PDF Code BibTeX Oral Long Paper

Visual Storytelling

Ting-Hao Huang , Francis Ferraro , Nasrin Mostafazadeh , Ishan Misra , Jacob Devlin , Aishwarya Agrawal , Ross Girshick , Xiaodong He , Pushmeet Kohli , et al.

NAACL 2016

PDF BibTeX

Watch and Learn: Semi-Supervised Learning of Object Detectors from Video

Ishan Misra , Abhinav Shrivastava , Martial Hebert

CVPR 2015

PDF BibTeX

Applying artificial vision models to human scene understanding

Elissa Aminoff , M. Toneva , Abhinav Shrivastava , Xinlei Chen , Ishan Misra , et al.

Journal of Frontiers in Computational Neuroscience 2015

PDF

Data-driven Exemplar Model Selection

Ishan Misra , Abhinav Shrivastava , Martial Hebert

WACV 2014

PDF BibTeX Best Student Paper